pruning technique
The silence of the weights: an investigation of structural pruning strategies for attention-based audio signal architectures
Diecidue, Andrea, Barbano, Carlo Alberto, Fraternali, Piero, Fontaine, Mathieu, Tartaglione, Enzo
ABSTRACT Transformer-based models have become the state of the art across multiple domains, from natural language processing to machine listening, thanks to attention mechanisms. However, the attention layers require a large number of parameters and high-end hardware for both training and inference. We propose a novel pruning technique targeted explicitly at the attention mechanism, in which we decouple the pruning of the four layers in the attention block, namely the query, key, value, and output projection matrices. We also investigate strategies for pruning along the head and channel dimensions, and compare the performance of the Audio Spectrogram Transformer (AST) [1] model under different pruning scenarios. Our results show that even when pruning 50% of the attention parameters, we incur a performance degradation of less than 1%.
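The decoupled, per-projection pruning described above could be sketched as follows. This is a minimal illustration, not the paper's implementation: the head-slicing convention and the L1-norm importance score are assumptions.

```python
import numpy as np

def prune_projection_heads(W, num_heads, keep_ratio):
    """Zero out the lowest-magnitude heads of ONE attention projection.

    W: (d_model, d_model) weight matrix for a single projection
       (query, key, value, or output). Each projection is pruned
       independently of the other three, mirroring the decoupled
       strategy in the abstract. Scoring heads by L1 norm is an
       assumption made for this sketch.
    """
    d_model = W.shape[0]
    head_dim = d_model // num_heads
    # View the matrix as one slice of rows per head.
    heads = W.reshape(num_heads, head_dim, d_model)
    # Importance score: total absolute weight mass of each head.
    scores = np.abs(heads).sum(axis=(1, 2))
    keep = max(1, int(round(num_heads * keep_ratio)))
    kept = np.argsort(scores)[-keep:]
    mask = np.zeros(num_heads, dtype=bool)
    mask[kept] = True
    # Zero out the pruned heads; a real implementation would also
    # shrink the matrix to realize the compute savings.
    pruned = heads * mask[:, None, None]
    return pruned.reshape(d_model, d_model), mask
```

Because each of the four projections gets its own call, the query matrix can end up with a different set of surviving heads than the value matrix, which is the point of decoupling them.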
Scalable Interconnect Learning in Boolean Networks
Kresse, Fabian, Yu, Emily, Lampert, Christoph H.
Learned Differentiable Boolean Logic Networks (DBNs) already deliver efficient inference on resource-constrained hardware. We extend them with a trainable, differentiable interconnect whose parameter count remains constant as input width grows, allowing DBNs to scale to far wider layers than earlier learnable-interconnect designs while preserving their advantageous accuracy. To further reduce model size, we propose two complementary pruning stages: an SAT-based logic equivalence pass that removes redundant gates without affecting performance, and a similarity-based, data-driven pass that outperforms a magnitude-style greedy baseline and offers a superior compression-accuracy trade-off.
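The similarity-based, data-driven pruning pass might look roughly like this. It is a hypothetical sketch: the agreement metric, the threshold, and the calibration setup are assumptions, not the paper's method.

```python
import numpy as np

def similarity_prune(gate_outputs, threshold=0.95):
    """Data-driven pass: merge each gate into an earlier gate whose
    sampled outputs agree on at least `threshold` of the inputs.

    gate_outputs: (num_gates, num_samples) boolean array of each
    gate's activations on a calibration batch (hypothetical setup).
    Returns a dict {pruned_gate: surviving_gate}; consumers of a
    pruned gate would be rewired to its surviving near-duplicate.
    """
    n, _ = gate_outputs.shape
    merged = {}
    for i in range(n):
        if i in merged:
            continue
        for j in range(i + 1, n):
            if j in merged:
                continue
            agreement = (gate_outputs[i] == gate_outputs[j]).mean()
            if agreement >= threshold:
                merged[j] = i  # reroute consumers of gate j to gate i
    return merged
```

Unlike the SAT-based pass, which only removes gates that are provably equivalent, a pass like this trades a bounded amount of disagreement on the calibration data for extra compression.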
Energy-Aware LLMs: A step towards sustainable AI for downstream applications
Tran, Nguyen Phuc, Jaumard, Brigitte, Delgado, Oscar
Advanced Large Language Models (LLMs) have revolutionized various fields, including communication networks, sparking an innovation wave that has led to new applications and services and significantly enhanced solution schemes. Despite these impressive developments, most LLMs require huge computational resources, resulting in extremely high energy consumption. This study therefore proposes an end-to-end pipeline that investigates the trade-off between energy efficiency and model performance for an LLM applied to fault ticket analysis in communication networks. It evaluates the pipeline using two real-world datasets for the tasks of root cause analysis and response feedback in a communication network. Our results show that an appropriate combination of quantization and pruning techniques is able to reduce energy consumption while significantly improving model performance.
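The kind of quantization-plus-pruning combination such a pipeline evaluates can be illustrated with a toy example: magnitude pruning followed by symmetric uniform quantization. All details here are assumptions for illustration, not the paper's actual pipeline.

```python
import numpy as np

def compress_weights(W, prune_ratio=0.5, num_bits=8):
    """Toy compression step: magnitude pruning, then symmetric
    uniform quantization of the surviving weights.

    The specific combination (unstructured magnitude pruning plus
    per-tensor symmetric quantization) is an assumption made for
    this sketch, not the method evaluated in the paper.
    """
    flat = np.abs(W).ravel()
    k = int(flat.size * prune_ratio)
    # Threshold at the k-th smallest magnitude; everything below is zeroed.
    thresh = np.partition(flat, k)[k] if k > 0 else 0.0
    Wp = np.where(np.abs(W) >= thresh, W, 0.0)
    # Symmetric uniform quantization to num_bits.
    max_abs = np.abs(Wp).max()
    scale = max_abs / (2 ** (num_bits - 1) - 1) if max_abs > 0 else 1.0
    q = np.round(Wp / scale).astype(np.int8)
    return q, scale  # dequantize with q * scale
```

The energy question the abstract raises is then an empirical one: whether the smaller integer representation reduces measured consumption enough to justify any accuracy cost on the downstream task.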
EvoP: Robust LLM Inference via Evolutionary Pruning
Wu, Shangyu, Du, Hongchao, Xiong, Ying, Chen, Shuai, Kuo, Tei-wei, Guan, Nan, Xue, Chun Jason
Large Language Models (LLMs) have achieved remarkable success in natural language processing tasks, but their massive size and computational demands hinder their deployment in resource-constrained environments. Existing structured pruning methods address this issue by removing redundant structures (e.g., elements, channels, layers) from the model. However, these methods employ heuristic pruning strategies, which leads to suboptimal performance. Moreover, they ignore the characteristics of the data when pruning the model. To overcome these limitations, we propose EvoP, an evolutionary pruning framework for robust LLM inference. EvoP first presents a cluster-based calibration dataset sampling (CCDS) strategy for creating a more diverse calibration dataset. EvoP then introduces an evolutionary pruning pattern searching (EPPS) method to find the optimal pruning pattern. Compared to existing structured pruning techniques, EvoP achieves the best performance while maintaining the best efficiency. Experiments across different LLMs and different downstream tasks validate the effectiveness of the proposed EvoP, making it a practical and scalable solution for deploying LLMs in real-world applications.
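An evolutionary search over pruning patterns can be sketched as a simple genetic algorithm over binary keep/remove masks. This is a toy illustration only; EvoP's actual fitness function, genetic operators, and CCDS calibration sampling are not reproduced here.

```python
import random

def evolve_pruning_pattern(fitness, num_units, sparsity,
                           pop=20, gens=30, seed=0):
    """Toy evolutionary search for a binary keep(1)/remove(0) pattern.

    fitness: callable(mask) -> loss on a calibration set (lower is
        better); a stand-in for whatever model-quality measure an
        evolutionary pruner would evaluate.
    num_units: number of prunable structures (e.g. layers).
    sparsity: fraction of units to remove in every candidate.
    """
    rng = random.Random(seed)
    k = int(num_units * sparsity)

    def random_mask():
        removed = rng.sample(range(num_units), k)
        return tuple(0 if i in removed else 1 for i in range(num_units))

    def mutate(mask):
        # Swap one kept unit with one removed unit, preserving sparsity.
        m = list(mask)
        kept = [i for i, b in enumerate(m) if b]
        removed = [i for i, b in enumerate(m) if not b]
        a, b = rng.choice(kept), rng.choice(removed)
        m[a], m[b] = 0, 1
        return tuple(m)

    population = [random_mask() for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness)          # elitist selection
        survivors = population[: pop // 2]
        population = survivors + [mutate(rng.choice(survivors))
                                  for _ in range(pop - len(survivors))]
    return min(population, key=fitness)
```

The contrast with heuristic strategies is that the search evaluates whole pruning patterns on calibration data, rather than scoring each structure in isolation.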
Review for NeurIPS paper: HYDRA: Pruning Adversarially Robust Neural Networks
Weaknesses: - It is not clear that HYDRA improves robustness to adversarial attacks. Test accuracy (benign) appears to correlate with adversarial accuracy (see Table 1). The authors also observe this indirectly at L217: "Our results confirm that the compressed networks show similar trends as non-compressed nets with these attacks." It seems that as long as models are compressed properly, the resulting models are about as robust as dense networks. It is therefore important to evaluate some SOTA sparse networks and compare them with HYDRA.